Fast Generation of Accurate Synthetic Microdata
Abstract
Generation of a synthetic microdata set that reproduces the statistical properties of an original microdata set is a promising approach to statistical disclosure control (SDC) of microdata. In this paper, a new method for generating continuous synthetic microdata is proposed. The covariance matrix and the univariate statistics of the original data set are exactly preserved. The method is non-iterative and its complexity grows linearly with the number of records to be protected.
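The abstract does not spell out the algorithm, so the following is only a minimal sketch of one standard non-iterative way to obtain synthetic continuous data whose sample means and covariance matrix match the original up to floating-point rounding: whiten a random draft with its own Cholesky factor, then recolor it with the Cholesky factor of the original covariance. The function name `synthesize`, the Gaussian draft, and the `seed` parameter are assumptions made for illustration and are not taken from the paper. The sketch also assumes at least two attributes, more records than attributes, and a positive-definite covariance matrix; it preserves only means, variances, and covariances, not any other univariate statistics.

```python
import numpy as np


def synthesize(original, seed=0):
    """Return synthetic records with the same sample mean and covariance.

    Hypothetical helper: a whiten-then-recolor sketch, not the paper's code.
    `original` is an (n_records, n_attributes) array with n_attributes >= 2.
    """
    X = np.asarray(original, dtype=float)
    n, d = X.shape
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)          # d x d sample covariance

    # Random draft with exactly zero column means.
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n, d))
    Z -= Z.mean(axis=0)

    # Whiten: make the draft's sample covariance the identity matrix.
    L_z = np.linalg.cholesky(np.cov(Z, rowvar=False))
    W = np.linalg.solve(L_z, Z.T).T        # equals Z @ inv(L_z).T

    # Recolor with the original covariance and restore the original means.
    L_x = np.linalg.cholesky(cov)
    return W @ L_x.T + mean


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = rng.normal(size=(1000, 3)) * [1.0, 5.0, 0.2] + [10.0, -3.0, 0.5]
    synthetic = synthesize(data)
    print(np.allclose(synthetic.mean(axis=0), data.mean(axis=0)))
    print(np.allclose(np.cov(synthetic, rowvar=False), np.cov(data, rowvar=False)))
```

Each step is a single pass over the records plus operations on small attribute-by-attribute matrices, so for a fixed number of attributes the cost grows linearly with the number of records, which is consistent with the complexity claim in the abstract.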
Similar resources
Post-Masking Optimization of the Tradeoff between Information Loss and Disclosure Risk in Masked Microdata Sets
Previous work by these authors has been directed to measuring the performance of microdata masking methods in terms of information loss and disclosure risk. Based on the proposed metrics, we show here how to improve the performance of any particular masking method. In particular, post-masking optimization is discussed for preserving as much as possible the moments of first and second order (and...
Information Loss in Continuous Hybrid Microdata: Subdomain-Level Probabilistic Measures
The goal of privacy protection in statistical databases is to balance the social right to know and the individual right to privacy. When microdata (i.e. data on individual respondents) are released, they should stay analytically useful but should be protected so that it cannot be decided whether a published record matches a specific individual. However, there is some uncertainty in the assessme...
Development of Synthetic Microdata for Educational Use in Japan
Japan’s new Statistics Act came fully into effect in April 2009. The new law allows access to anonymized microdata, but it requires users to go through an application process and imposes some restrictions. The National Statistics Center (NSTAC) has developed a type of microdata that can be accessed without an application process and used without restrictions. These data do...
Microdata Protection
Governmental, public, and private organizations are increasingly required to make data available for external release in a selective and secure fashion. Today, most data are released in the form of microdata, reporting information on individual respondents. The protection of microdata against improper disclosure is therefore an issue that has become increasingly important and will co...
Synthetic Data Generation using Benerator Tool
Datasets of different characteristics are needed by the research community for experimental purposes. However, real data may be difficult to obtain due to privacy concerns. Moreover, real data may not meet specific characteristics which are needed to verify new approaches under certain conditions. Given these limitations, the use of synthetic data is a viable alternative to complement the real ...